A Comparison of the ANCILLES and LOGIST
Parameter Estimation Procedures for the Three-Parameter 
Logistic Model Using Goodness of Fit as a Criterion 

Due to the growing use of latent trait models and the wide range of
applications of these models (see the Journal of Educational Measurement,
Summer, 1977), it has become important to investigate the properties of
the numerous procedures that are available for estimating the parameters of
the models. There are a number of different models in current use (e.g.,
one-, two-, and three-parameter logistic; graded response; nominal response),
and for many of these models item parameters can be estimated in several ways.
While some research has been done to investigate the differences between
the models (Reckase, 1977; Yen, in press; Divgi, 1980; Urry, 1970, 1977a),
little has been done to compare estimation procedures for a given model.

One commonly used latent trait model is the three-parameter logistic 
(3PL) model. There are at least three estimation procedures available for 
the 3PL model, each based on a different computer program. For example, the 
ANCILLES (Urry, 1978), OGIVIA (Urry, 1977b), and LOGIST (Wood, Wingersky, 
and Lord, 1976) programs are all designed to estimate parameters for the 3PL 
model. Very little has been done to study the differences among these three
procedures. Although they are based on the same model, the methods that these
programs employ to estimate the parameters for the model are quite different
(the differences between ANCILLES and OGIVIA are not as great as the
differences between LOGIST and the others). The few studies that have dealt
with the differences among these procedures have been concerned primarily with
the ability of the procedures to reproduce true item and ability parameters
faithfully. For instance, in a simulation study conducted by Ree (1979), three
groups of 2,000 subjects were simulated, and the simulated responses were
calibrated using the ANCILLES, OGIVIA, and LOGIST procedures. The estimated
parameters were compared to the true parameters, the estimated true scores
were compared, and an information comparison was made. It was concluded that
the selection of an item calibration program should depend on the distribution
of ability in the calibration sample, the intended use of the parameter
estimates, and the computer resources available. Specifically, LOGIST
performed best for rectangular ability distributions and OGIVIA performed best
for normally distributed ability groups. Also, LOGIST was more expensive to
run, but the OGIVIA and ANCILLES programs did not always give estimates for
every item.

The Ree study indicated that there were differences in the quality of 
the parameter estimates given by the procedures considered, and the conclusions 
provided guidelines for selecting procedures for the model. This type of 
study is useful, and should be extended to include other models, but there 
are other comparisons that should be made. One important comparison that 
was not made was a comparison of the procedures using the fit of the model 
to the data as a criterion, an important factor when considering the quality 
of parameter estimation using the procedures. The purpose of this study, 
then, is to extend the comparison of the 3PL parameter estimation procedures 
to include a comparison of the fit of the 3PL model to real data when using
the different procedures. Before reporting the present study, however, a 
discussion of the model and procedures, as well as the fit statistics used, 
will be given. 



The Model and Procedures 

The model that was employed in this study was the three-parameter
logistic model presented by Birnbaum (1968). The model requires three
parameters for each item and one ability parameter for each examinee. The
model is given by

$$P_i(\theta_j) = c_i + (1 - c_i)\,\frac{\exp[D a_i(\theta_j - b_i)]}{1 + \exp[D a_i(\theta_j - b_i)]} \qquad (1)$$

where $\theta_j$ is the ability parameter for Examinee j, $a_i$ is the item
discrimination parameter, $b_i$ is the item difficulty parameter, $c_i$ is the
item guessing parameter, $P_i(\theta_j)$ is the probability of a correct
response to Item i, and D is a scaling constant equal to 1.7.
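As a quick illustration, Equation 1 can be evaluated directly. The following Python sketch is illustrative only; the function name `p_3pl` is hypothetical and is not taken from ANCILLES or LOGIST.

```python
import math

D = 1.7  # scaling constant in Equation 1


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model (Equation 1)."""
    z = D * a * (theta - b)
    # exp(z) / (1 + exp(z)), computed in a form that cannot overflow
    if z >= 0:
        logistic = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        logistic = ez / (1.0 + ez)
    return c + (1.0 - c) * logistic


# At theta == b the curve is halfway between c and 1: 0.2 + 0.8 * 0.5
print(round(p_3pl(0.0, a=1.0, b=0.0, c=0.2), 4))  # 0.6
```

As θ decreases, the probability approaches $c_i$ rather than 0, which is why $c_i$ is described as a guessing parameter.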

There are three commonly used programs for estimating the parameters of the
3PL model (ANCILLES, LOGIST, and OGIVIA), but because ANCILLES is a newer
version of OGIVIA, OGIVIA was not included in this study.
The ANCILLES estimation procedure is a two-stage procedure. In the first
stage, raw scores, corrected to exclude the score on the item being
calibrated, are used as a measure of manifest ability. Using the corrected raw
scores, the program computes item characteristic curves (ICCs) for various
sets of guessing, discrimination, and difficulty values. The proportions of
examinees within set intervals of manifest ability who passed the item are
computed, and those values are compared to the generated ICCs. Chi-square fit
statistics are computed for each ICC, and the set of values with the minimum
chi-square is selected. This procedure is repeated for all the items to be
calibrated. Then a second stage is begun, in which unregressed Bayesian modal
estimates (UBMEs) are used as manifest ability in place of raw scores. This
substitution is made because the UBMEs more closely approximate the latent
ability distribution. Using the UBMEs, ancillary estimates of the item
parameters are made.
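The first ANCILLES stage amounts to a minimum chi-square search. The sketch below is a simplified, hypothetical illustration in Python, not the ANCILLES algorithm itself: we assume that the corrected raw scores are standardized to z-scores to serve as manifest ability, that intervals have a fixed width of 0.5, and that candidate parameter values come from a small user-supplied grid. All function names are our own.

```python
import itertools
import math
from typing import List, Sequence, Tuple

D = 1.7


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    z = max(min(D * a * (theta - b), 35.0), -35.0)  # clip so exp() stays finite
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def corrected_z_scores(responses: List[List[int]], item: int) -> List[float]:
    """Raw score excluding `item`, standardized (assumed manifest-ability scale)."""
    raw = [sum(row) - row[item] for row in responses]
    n = len(raw)
    mean = sum(raw) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in raw) / n) or 1.0
    return [(r - mean) / sd for r in raw]


def interval_table(theta: Sequence[float], u: Sequence[int], width: float = 0.5):
    """Return (midpoint, count, observed proportion passing) per occupied interval."""
    groups = {}
    for t, x in zip(theta, u):
        key = math.floor(t / width)
        n, passed = groups.get(key, (0, 0))
        groups[key] = (n + 1, passed + x)
    return [((k + 0.5) * width, n, passed / n) for k, (n, passed) in sorted(groups.items())]


def min_chisq_item(responses, item, a_grid, b_grid, c_grid) -> Tuple[float, float, float, float]:
    """Grid search for the (a, b, c) whose ICC gives the smallest chi-square."""
    theta = corrected_z_scores(responses, item)
    u = [row[item] for row in responses]
    table = interval_table(theta, u)
    best = (math.inf, 0.0, 0.0, 0.0)
    for a, b, c in itertools.product(a_grid, b_grid, c_grid):
        chisq = 0.0
        for mid, n, obs in table:
            e = p_3pl(mid, a, b, c)
            chisq += n * (obs - e) ** 2 / (e * (1.0 - e))
        if chisq < best[0]:
            best = (chisq, a, b, c)
    return best
```

In ANCILLES this pass is then repeated with UBMEs in place of the corrected raw scores; that second stage is not sketched here.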

The LOGIST procedure, on the other hand, uses neither Bayesian modal
ability estimation nor minimum chi-square item parameter estimation.
Rather, LOGIST uses maximum likelihood estimation for both ability and item
parameters. Initial values for the item parameter estimates are set, and
ability estimates are computed for all the examinees using maximum likelihood
estimation. Then the ability estimates are held fixed, and new estimates are
made for the a- and b-values, again using maximum likelihood estimation.
These two steps, together called a stage, are repeated a number of times
with the c-values held fixed.
In later stages the c-values are allowed to vary, but the changes in them
are restricted. The procedure cycles through as many stages as are needed to
converge. Convergence is reached when the difference between the estimates
for successive stages is less than errors of ...

While this is not a complete discussion of how these two procedures
operate, it does illustrate how differently the ANCILLES and LOGIST
procedures go about estimating the parameters. For a more complete
discussion of these procedures, see Wood, Wingersky, and Lord (1976) and
Urry (1978).
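The ability half of a LOGIST stage can be illustrated with a small maximum likelihood routine. This Python sketch is a hypothetical simplification, not LOGIST's code. It estimates θ for one examinee with the item parameters held fixed, and it uses Fisher scoring (the expected rather than the observed information). It also clamps estimates to an assumed range of [-4, 4], because all-correct response patterns have no finite maximum.

```python
import math
from typing import Sequence, Tuple

D = 1.7


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    z = max(min(D * a * (theta - b), 35.0), -35.0)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def score_and_info(theta, u, a, b, c) -> Tuple[float, float]:
    """First derivative of the log-likelihood and the test information at theta."""
    score = info = 0.0
    for ui, ai, bi, ci in zip(u, a, b, c):
        p = p_3pl(theta, ai, bi, ci)
        dp = D * ai * (p - ci) * (1.0 - p) / (1.0 - ci)  # dP/dtheta for the 3PL
        score += (ui - p) * dp / (p * (1.0 - p))
        info += dp * dp / (p * (1.0 - p))
    return score, info


def ml_theta(u: Sequence[int], a, b, c, lo=-4.0, hi=4.0, tol=1e-8, max_iter=200) -> float:
    """Maximum likelihood ability estimate with item parameters held fixed."""
    theta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(theta, u, a, b, c)
        step = max(min(score / info, 1.0), -1.0)  # damp large Fisher-scoring steps
        new_theta = min(max(theta + step, lo), hi)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


a, b, c = [1.0] * 5, [-2.0, -1.0, 0.0, 1.0, 2.0], [0.2] * 5
print(ml_theta([1, 1, 1, 0, 0], a, b, c))
```

A full stage would follow this with the corresponding maximum likelihood step for each item's a- and b-values, holding the θ estimates fixed.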



Goodness of Fit Statistics 

Whenever a model is used to approximate real data, it is important to
determine the accuracy of the approximation. The failure of a model to
fit the data adequately may result in inaccuracies in measurements
based on that model. Goodness of fit of the model to empirical data, then,
is clearly an important property to consider when selecting a model. It
is also important when considering which procedure to use for estimating
the parameters of a model. Different estimation procedures may result in
different estimates for the same parameters. If the different sets of
estimates fit the data equally well, then either procedure may be
appropriate. However, if the two sets of estimates do not fit the data
equally well, the procedure yielding the better fit is the more desirable
procedure.

A number of statistical goodness of fit tests for gauging the fit of a model
to data have been proposed. Most of these involve computing statistics that
follow a chi-square or an approximate chi-square distribution. For instance,
a fit statistic for the 1PL model proposed by Wright and Panchapakesan (1969)
involves dividing examinees into groups according to number-right scores and,
for each score group, computing the observed and expected proportions of
examinees passing the item, with the expected proportion computed from the
model. From these proportions a fit statistic is computed with the following
formula:

$$\chi^2_i = \sum_{j=1}^{J} \frac{N_j\,(O_{ij} - E_{ij})^2}{E_{ij}(1 - E_{ij})} \qquad (2)$$

where $O_{ij}$ is the observed proportion passing Item i in Score Group j,
$E_{ij}$ is the proportion predicted by the model, $N_j$ is the number of
examinees in Score Group j, and the summation is over the J score groups for
which the number of examinees in the group is not zero. Number-right score
groups can be used because the number-right score is a sufficient statistic
for estimating θ for the 1PL model; that is, each score group contains
examinees with the same θ estimate. This statistic is essentially a sum of
squared z-scores. Wright and Panchapakesan (1969) state that this statistic
has J - 1 degrees of freedom, where J is the number of score groups with
$N_j \neq 0$. A variation on this statistic, used by Rentz and Bashaw (1975),
involves computing the above and then dividing it by the number of score
groups for which the number of examinees in the group is not zero, obtaining
as a result a "mean square" fit statistic.
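Equation 2 and the Rentz and Bashaw mean-square variant can be computed from grouped counts. In this Python sketch (function names are hypothetical), the caller supplies the expected proportions $E_{ij}$, since under the 1PL model they come from the ability estimate attached to each number-right score.

```python
from typing import Dict, List, Sequence, Tuple


def group_by_score(raw_scores: Sequence[int], item_responses: Sequence[int]) -> Dict[int, Tuple[int, float]]:
    """Map each number-right score to (N_j, observed proportion passing the item)."""
    counts: Dict[int, List[int]] = {}
    for score, u in zip(raw_scores, item_responses):
        cell = counts.setdefault(score, [0, 0])
        cell[0] += 1  # examinees in the score group
        cell[1] += u  # examinees passing the item
    return {s: (n, passed / n) for s, (n, passed) in counts.items()}


def wright_panchapakesan(groups: Sequence[Tuple[int, float, float]]):
    """groups: (N_j, O_ij, E_ij) per score group. Returns (chi-square, df, mean square)."""
    used = [(n, o, e) for n, o, e in groups if n > 0]  # only groups with N_j != 0
    chisq = sum(n * (o - e) ** 2 / (e * (1.0 - e)) for n, o, e in used)
    j = len(used)
    return chisq, j - 1, chisq / j  # df = J - 1; mean square per Rentz and Bashaw
```

Dividing by J rather than by the degrees of freedom follows the description of the Rentz and Bashaw (1975) variant given above.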

A procedure not limited to the 1PL model was proposed by Yen (in press).
This statistic differs from the Wright and Panchapakesan statistic in that
examinees are not grouped by number-right scores. Rather, examinees are
ordered according to their ability estimates. The range of ability estimates
is then divided into categories (Yen suggests 10), and the observed and
expected proportions are computed for those categories. This fit statistic is
given by

$$\chi^2_i = \sum_{j=1}^{10} \frac{N_j\,(O_{ij} - E_{ij})^2}{E_{ij}(1 - E_{ij})} \qquad (3)$$

where $N_j$, $O_{ij}$, and $E_{ij}$ are as defined previously. Since the
categories for this statistic are not based on number-right scores, this
statistic is not limited to the 1PL model. Yen (in press) suggests that this
statistic has 10 - m degrees of freedom, where m is the number of item
parameters estimated.
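A minimal Python sketch of Equation 3 for one item follows. Two details are our assumptions rather than statements from the text. First, examinees are split into 10 groups of (nearly) equal size after sorting by estimated ability. Second, $E_{ij}$ is taken as the mean model-predicted probability within each group. The degrees of freedom follow Yen's suggested 10 - m.

```python
import math
from typing import Sequence, Tuple

D = 1.7


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    z = max(min(D * a * (theta - b), 35.0), -35.0)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def yen_chisq(theta_hat: Sequence[float], u: Sequence[int], a: float, b: float, c: float,
              n_groups: int = 10, n_params: int = 3) -> Tuple[float, int]:
    """Chi-square fit of one item (Equation 3) with examinees ordered by ability."""
    order = sorted(range(len(theta_hat)), key=lambda k: theta_hat[k])
    n = len(order)
    chisq = 0.0
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            continue
        n_j = len(members)
        observed = sum(u[k] for k in members) / n_j
        expected = sum(p_3pl(theta_hat[k], a, b, c) for k in members) / n_j
        chisq += n_j * (observed - expected) ** 2 / (expected * (1.0 - expected))
    return chisq, n_groups - n_params
```

A fixed-width split of the ability range would also match the description above; only the line that builds `members` would change.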

A similar statistic, s, was suggested by Wright and Mead (1977). This
statistic is given by

$$s_i = \frac{1}{J} \sum_{j=1}^{J} \frac{N_j\,(O_{ij} - E_{ij})^2}{E_{ij}(1 - E_{ij}) - \sigma^2_{P_{ij}}} \qquad (4)$$

where $O_{ij}$ and $E_{ij}$ are as defined above and $\sigma^2_{P_{ij}}$ is the
variance within Category j of the predicted proportions passing the item
(Yen, in press). Wright and Mead suggest the inclusion of the
$\sigma^2_{P_{ij}}$ term because examinees within a category do not have the
same θ, and including the term provides a more accurate estimate of the
variance of $O_{ij}$ than does the denominator in Equation 3. For this
statistic, examinees are grouped in the same way as for the Yen statistic.
However, Wright and Mead suggest six or fewer categories, rather than the 10
suggested by Yen. The constant 1/J yields a mean fit statistic over the J
categories.
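The following Python sketch computes s as written in Equation 4 for one item. It is illustrative only. Each category is passed in as the 0/1 responses and the model-predicted probabilities of its examinees, which is a hypothetical input format. From these the function forms $N_j$, $O_{ij}$, $E_{ij}$, and the within-category variance of the predicted probabilities.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def wright_mead_s(categories: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Mean fit statistic s (Equation 4) over the non-empty categories."""
    terms = []
    for responses, predicted in categories:
        n_j = len(responses)
        if n_j == 0:
            continue
        o = fmean(responses)            # observed proportion passing
        e = fmean(predicted)            # mean predicted proportion
        var_p = pvariance(predicted)    # spread of predicted probabilities in the category
        terms.append(n_j * (o - e) ** 2 / (e * (1.0 - e) - var_p))
    return sum(terms) / len(terms)      # the 1/J factor
```

When every examinee in a category has the same predicted probability, `var_p` is zero and each term reduces to the corresponding term of Equation 3.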

One statistic for measuring goodness of fit of a model to data that is
not based on the chi-square distribution is the mean square deviation (MSD)
statistic proposed by Reckase (1977). The MSD statistic is given by

$$\mathrm{MSD}_i = \frac{1}{N} \sum_{j=1}^{N} (u_{ij} - P_{ij})^2 \qquad (5)$$

where $u_{ij}$ is the response to Item i by Examinee j, $P_{ij}$ is the
probability of a correct response as given by the model, and N is the number
of examinees. The purpose of this statistic is to avoid the differences caused
by the different interval sizes encountered with the statistics described
above. Reckase suggests that, even though the sampling distribution of the
statistic is unknown, hypotheses can still be tested because only comparative
information is of interest. Thus, differences in MSD statistics obtained for
different procedures for a single set of items can be tested using analysis of
variance procedures (or, in the case of two procedures, a simple dependent
t-test). Because the statistic does not group examinees, its use is not
limited to a single model.
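Equation 5 and the dependent t-test used to compare two procedures can be sketched in Python as follows (function names are our own). Items are paired across procedures, so the t-test is applied to the item-by-item MSD differences.

```python
import math
from statistics import fmean, stdev
from typing import Sequence, Tuple


def msd(u: Sequence[int], p: Sequence[float]) -> float:
    """Mean square deviation (Equation 5) for one item across N examinees."""
    return sum((ui - pi) ** 2 for ui, pi in zip(u, p)) / len(u)


def dependent_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for item-by-item differences x - y, with n - 1 df."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return fmean(d) / (stdev(d) / math.sqrt(n)), n - 1
```

Here `x` and `y` would be the per-item MSD values for the two procedures, listed in the same item order.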

For the present study, the MSD statistic and the chi-square statistic
suggested by Yen were selected. Since the present study is concerned with
procedures for estimating the parameters of the 3PL model, fit statistics
based on number-right score groups are clearly inappropriate. In a comparison
of the Yen statistic and the statistic proposed by Wright and Mead, Yen (in
press) found virtually no difference between the two statistics. Yen
concluded that with 10 categories the values of $\sigma^2_{P_{ij}}$ would be
sufficiently small to make it unnecessary to adjust the denominator in the
chi-square statistic. Because of concern over the effect that category sizes
have on the chi-square statistic, the MSD statistic was also included in the
analyses.

Analyses for the current study, then, include the comparison of the
chi-squares obtained for the two procedures using the statistic proposed by
Yen, and a comparison of the MSD statistics obtained for the two procedures. In
addition, direct comparisons of the obtained parameter estimates will be
made. These comparisons will include descriptive statistics and correlations
of the distributions of ability and item parameter estimates obtained from
the ANCILLES and LOGIST programs, as well as plots of the observed proportions
of examinees passing an item against the proportions predicted by the model
using the estimates from the procedures.
Method

Test Data 

The data-set used for this study was constructed from a 4,000 case sample 
of the Iowa Test of Educational Development (ITED). The test items were a 
stratified random sample of 50 items from the various subtests of the ITED. 
Response data for 1,999 examinees were sampled from among four grade levels.



Analyses 

To begin the analyses, ability and item parameter estimates for the 3PL
model were obtained by running both the ANCILLES and LOGIST calibration
programs on the response data. For each set of estimates obtained, chi-squares
were computed for each of the items using the following procedure. First, the
range of ability estimates was divided into 49 categories of .1 width (the
end categories were larger so as to keep all cell frequencies > 5). Examinees
were then grouped according to the category containing their ability
estimates. For each category, both the proportion of examinees in that
category passing the item and the proportion failing the item were obtained.
Also, for each category the expected proportion passing and the expected
proportion failing the item were computed. The expected proportion passing an
item, as predicted by the 3PL model, is



$$E_{ij} = c_i + (1 - c_i)\,\frac{\exp[1.7\,a_i(\theta_j - b_i)]}{1 + \exp[1.7\,a_i(\theta_j - b_i)]} \qquad (6)$$

where $E_{ij}$ is the proportion of examinees in Category j expected to pass
Item i, $\theta_j$ is the midpoint of Category j, and the other parameters are
as defined for Equation 1. It should be noted that, because of the small
category size, the variance of the expected proportions within a category was
quite small. For the purposes of this study, then, the expected proportions
were assumed to be constant within a category; that is, the variance of the
expected proportions was taken to be zero.
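The grouping rule described above can be sketched in Python. The code is a hypothetical reconstruction, not the study's program. Categories have a fixed width (0.1 by default, as stated in the text), and only occupied categories are kept. End categories are merged inward until each holds more than five examinees, and Equation 6 is evaluated at each category's midpoint.

```python
import math
from typing import Sequence


def categorize(theta_hat: Sequence[float], width: float = 0.1, min_freq: int = 5):
    """Fixed-width ability categories; end categories are collapsed inward."""
    bins = {}
    for k, t in enumerate(theta_hat):
        bins.setdefault(math.floor(t / width), []).append(k)
    cats = [[key * width, (key + 1) * width, members] for key, members in sorted(bins.items())]
    # Merge the lowest category into its neighbour until it has more than min_freq cases.
    while len(cats) > 1 and len(cats[0][2]) <= min_freq:
        lo, _, low_members = cats.pop(0)
        cats[0][0] = lo
        cats[0][2] = low_members + cats[0][2]
    # Same for the highest category.
    while len(cats) > 1 and len(cats[-1][2]) <= min_freq:
        _, hi, high_members = cats.pop()
        cats[-1][1] = hi
        cats[-1][2] = cats[-1][2] + high_members
    return [(lo, hi, members) for lo, hi, members in cats]


def expected_proportion(lo: float, hi: float, a: float, b: float, c: float) -> float:
    """Equation 6 evaluated at the category midpoint."""
    z = 1.7 * a * ((lo + hi) / 2.0 - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))  # same as exp(z) / (1 + exp(z))
```

Taking the midpoint as $\theta_j$ corresponds to treating the expected proportion as constant within a category, as assumed above.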

Once the observed and expected proportions were obtained for both sets
of parameter estimates, chi-square statistics for each item, using both
sets of estimates, were computed using Equation 3 (with the modification that
48 categories were used instead of 10). Using these chi-squares, a number of
analyses were performed. First, the chi-square values were compared to the
critical value to determine whether they were significant. Then a comparison
was made to determine which procedure resulted in lack of fit for more items.
Next, the chi-squares for each procedure were summed, and the resulting
chi-squares were tested for significant lack of fit for the test as a whole.
Further analysis included a binomial test to determine whether the
chi-squares obtained for one procedure were larger than the chi-squares
obtained for the other procedure more often than would be expected by chance.
Two final analyses using the chi-squares involved graphic presentation of the
obtained values. One analysis involved plotting, for each category, the
observed and expected proportions passing the item. This was essentially a
visual comparison of empirical and theoretical ICCs for each item. Plots were
made for both procedures. The last analysis performed with the chi-squares for
each procedure was plotting the obtained distribution of chi-squares against
the theoretical distribution computed from the chi-square probability density
function.

A set of analyses was also performed using the MSD statistic set out in
Equation 5. For each set of estimates, MSD statistics were computed for each
item. The resulting statistics were tested for significant differences using
a dependent t-test.

A final set of analyses involved the direct comparison of the parameter
estimates obtained from the ANCILLES and LOGIST procedures. The analysis
included a comparison of the shapes of the distributions of the ability and
item parameter estimates, as well as correlations of the two sets of
estimates.



Results 

Chi-Square Analyses

The item chi-square statistics obtained for the ANCILLES and LOGIST
procedures are presented in Table 1. Item 1 and Item 9 were deleted by
ANCILLES during calibration. Comparison of these values to the critical value
required for significance at α = .05 revealed that significant lack of fit
was found for ... items for the ANCILLES procedure, and for six items for
the LOGIST procedure. Although such multiple comparisons increase the
probability of finding significant results, the intent is to compare the two
procedures rather than to evaluate the procedures item by item. Therefore,
the alpha level was not adjusted to accommodate the multiple comparisons. A
test for the significance of the difference between two correlated
proportions (Ferguson, 1976) yielded z = 2.68, indicating that a
significantly higher proportion of items showed lack of fit for the
ANCILLES procedure than for the LOGIST procedure (p < .05).

Given the results reported above, it is somewhat surprising that the
ANCILLES chi-square values are not larger than the LOGIST chi-square values
for significantly more than half the items. The ANCILLES chi-square value was
larger than the LOGIST chi-square value for only 25 items, and the ANCILLES
mean chi-square was not significantly larger than the mean chi-square value
for LOGIST (58.12 for ANCILLES and 52.44 for LOGIST). It would appear, then,
that the ANCILLES chi-square values were not larger than the LOGIST
chi-square values more often than would be expected by chance, but when they
were larger than the LOGIST values, they tended to be significant.
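The binomial (sign) test mentioned above can be sketched in Python. We assume a one-tailed test with p = .5 over the 48 items calibrated by both programs; the text does not state which tail or exact variant was used.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more 'wins' out of n items."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# ANCILLES chi-square larger than the LOGIST chi-square for 25 of 48 items
print(round(binomial_upper_tail(25, 48), 3))  # 0.443
```

Under these assumptions the probability is about .44, which is consistent with the nonsignificant result reported above.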
Table 1

ANCILLES vs. LOGIST Goodness of Fit Comparison
Using Yen's Chi-Square Statistic

Item    ANCILLES    LOGIST
  1        --        46.65
  2      70.36*      50.88
  3      58.95       48.49
  4      39.95       43.73
  5     116.45*      46.57
  6     133.14*      50.72
  7      34.00       41.97
  8      41.83       61.64
  9        --        46.04
 10      56.56       32.90
 11      51.13       48.37
 12      46.97       38.58
 13      81.97*      53.90
 14      35.38       59.62
 15      51.01       60.64
 16      61.68*      52.02
 17      75.22*      62.06*
 18      62.22*      44.90
 19      50.15       35.13
 20      36.33       53.92
 21      50.91       58.81
 22      57.51       69.78*
 23     104.66*      80.91*
 24      45.96       46.93
 25      48.84       48.26
 26      52.44       55.91
 27      93.87*      93.96*
 28      57.10       56.14
 29      76.61*      51.09
 30      43.76       52.43
 31      50.85       49.82
 32      33.92       45.20
 33      65.78*      58.18
 34      52.86       70.37*
 35      58.27       60.90
 36      41.66       44.55
 37      55.20       47.50
 38      50.97       51.54
 39      34.49       44.98
 40      50.54       57.05
 41      46.54       47.28
 42      72.42*      71.08*
 43      46.98       43.29
 44      70.49*      39.81
 45      62.56*      49.21
 46      67.57*      50.89
 47      28.85       38.22
 48      45.55       46.20
 49      55.85       56.35
 50      53.39       44.24

Note. The critical value for rejection of adequate fit is χ²(45) > 61.66
at α = .05. Dashes indicate items deleted by ANCILLES during calibration.
* Significant at the .05 level.
Another analysis that was performed on the chi-square values obtained
for the ANCILLES and LOGIST procedures was the summation of the chi-squares
over items to test whether there was significant lack of fit for the test as
a whole. Treating the sum as a chi-square with 48 × 45 = 2,160 degrees of
freedom, the normal approximation to the chi-square distribution yields a
standard deviation of about 66 (the square root of 2 × 2,160). The ANCILLES
chi-squares summed to 2789, which yielded z = 9.54. The LOGIST chi-squares
summed to 2517, which resulted in z = 5.41. Comparing these z-score values to
the standard normal distribution, clearly both summed chi-squares were
significant, indicating that there was significant lack of fit for the test
as a whole for both procedures.
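The normal approximation for the summed chi-squares can be sketched in Python as below (the function name is our own). Like the text, it treats the item chi-squares as if they were independent, so the sum has 48 × 45 degrees of freedom.

```python
import math


def summed_chisq_z(total: float, n_items: int, df_per_item: int) -> float:
    """Normal approximation for a sum of item chi-squares: (X - df) / sqrt(2 df)."""
    df = n_items * df_per_item
    return (total - df) / math.sqrt(2 * df)


print(round(summed_chisq_z(2789, 48, 45), 2))  # 9.57 (ANCILLES)
print(round(summed_chisq_z(2517, 48, 45), 2))  # 5.43 (LOGIST)
```

With the unrounded standard deviation (about 65.7) the z-values are 9.57 and 5.43. The slightly smaller values reported above come from rounding the standard deviation to 66.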

The final analyses performed on the obtained chi-square values involved
comparing the chi-square values to a graphic display of the empirical and
theoretical item characteristic curves. Figure 1 through Figure 48 show the
obtained and predicted proportions correct for each item plotted against the
ability estimates. Plots were made for both the ANCILLES and LOGIST parameter
estimates. Close examination of these figures reveals one consistent pattern
across items: the poorest fit for both procedures occurs at the lower end of
the ability scale. This is not surprising, since it is already known that the
lower asymptote of the ICC is difficult to estimate. It should be noted,
however, that the values at the lower end of the ability scale are somewhat
distorted by the collapsing of categories that was required for the
chi-square procedure. To keep category frequencies above five, the end
categories had to be collapsed, which produced some categories with relatively
large frequencies because of their width.

On the basis of a visual comparison of the plots for the two procedures, it
is difficult to determine whether one procedure fit any better than the other.
It is also difficult to predict from the plots the items for which lack of fit
was significant. For example, the ANCILLES chi-square value for Item 6 was
133.14, while the LOGIST chi-square value for Item 6 was 50.72. The plots for
Item 6, shown in Figure 5, do not at first indicate the large difference in
fit. However, closer investigation does yield some insight into the cause of
the difference in fit for that item. The intervals for the ANCILLES procedure
showing the largest discrepancy between the observed and expected proportions
correct are those intervals containing the greatest number of examinees.
Because each term of the chi-square statistic in Equation 3 is weighted by
$N_j$, a given discrepancy contributes much more in a heavily populated
interval. For instance, the intervals between θ = 1.0 and θ = 2.0 show a fair
amount of discrepancy between the observed and expected proportions correct.
In those intervals, frequencies vary from 60 to 90 examinees (see Figure 51).
For the LOGIST procedure, the poorest fit appears to occur near θ = 2.0 and
θ = -2.0. Frequencies in those intervals range from 10 to 20 examinees, which
is far lower than the frequencies in the intervals where the ANCILLES
procedure showed poor fit. This was not a consistent pattern across items,
however.

Figure 15 shows the plots for Item 17. Both procedures showed lack of fit
for Item 17, and it appears from the plots that the poorest fit was in the
same ability ranges for both procedures. For Item 23, shown in Figure 21, the
ANCILLES procedure shows lack of fit in approximately the same ability ranges
as in the other items discussed, but the LOGIST procedure appears to fit
poorly across the entire ability range. The plots, then, do not appear to
indicate any other consistent pattern for the procedures.
FIGURES 37-48. Plots of empirical and theoretical curves based on ANCILLES and LOGIST programs, one figure per item (Figure 38: Item 40). Each figure has an ANCILLES panel and a LOGIST panel plotting empirical (X) and theoretical proportions correct against interval midpoints.
Another analysis performed on the obtained chi-square values was to
plot the distributions of chi-squares obtained for the ANCILLES and LOGIST
procedures against the theoretical chi-square distribution for 45 degrees
of freedom. These plots are shown in Figure 49 and Figure 50 for the ANCILLES
and LOGIST procedures, respectively. From these plots it is clear that the
chi-squares obtained for the ANCILLES procedure were shifted to the right
of the expected distribution. The LOGIST chi-square distribution was also
shifted somewhat to the right, but not nearly so much as the ANCILLES
chi-squares.

One final analysis performed on the chi-square values was a chi-square
test of independence for the two procedures. That is, using the obtained
chi-square values, items were classified as fitting or nonfitting for each of
the two procedures. A chi-square test was then performed to test whether the
classification using the ANCILLES chi-squares was independent of the
classification using the LOGIST chi-squares. A chi-square value of 3.43 was
obtained. The critical value for α = .05 was χ²(1) = 3.84, so the hypothesis
of independence was not rejected: there was no significant association
between the two methods of classifying items as fitting or nonfitting. This
result was supported by a test for the significance of a coefficient of
agreement. A kappa coefficient (Cohen, 1960) was computed on the chi-square
classifications and then converted to a z-score. A kappa of .228 was
obtained, and dividing it by its standard error (σκ = .19) gave z = 1.2. The
null hypothesis of no agreement was not rejected.
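Both tests operate on the same 2 × 2 table of fit/misfit classifications. The Python sketch below is illustrative, and the function name is our own. We assume the uncorrected Pearson chi-square. For the kappa test we assume the simple null-hypothesis standard error sqrt(p_e / (N(1 - p_e))), which is one commonly used large-sample form. The cell counts are not reported in the text, so none are filled in here.

```python
import math
from typing import Tuple


def agreement_tests(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """2 x 2 table: rows = procedure 1 (misfit, fit), columns = procedure 2 (misfit, fit).

    Returns (chi-square with 1 df, Cohen's kappa, z for kappa under H0: kappa = 0).
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    chisq = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)  # no continuity correction
    p_obs = (a + d) / n                                      # observed agreement
    p_exp = (r1 * c1 + r2 * c2) / n ** 2                     # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp)
    se0 = math.sqrt(p_exp / (n * (1 - p_exp)))               # large-sample SE under the null
    return chisq, kappa, kappa / se0
```

The chi-square and the kappa z-test ask closely related questions about the same table, which is why the two results agree in the text.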



MSD Statistics

The MSD statistics obtained for the two procedures are displayed in
Table 2. The dependent t-test performed on these values showed the mean
ANCILLES MSD value to be significantly higher than the mean LOGIST MSD value
(t(47) = 2.15, p < .05). However, a comparison of Table 2 with Table 1
indicates that there is no apparent relationship between the size of the
chi-square values and the MSD statistics obtained for the items for either
procedure. Pearson product-moment correlations were computed between the MSD
and chi-square values, and the correlations for both the LOGIST and ANCILLES
procedures were not significantly different from zero (r = .12 for ANCILLES
and r = .19 for LOGIST).
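The significance test for these correlations can be sketched in Python with the usual t transformation of r (function names are our own).

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two paired sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> float:
    """t statistic (n - 2 df) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
```

Assuming 48 items, r = .19 gives t ≈ 1.31 on 46 degrees of freedom, below the two-tailed .05 critical value of about 2.01.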
FIGURE 49. Observed distribution of chi-squares for ANCILLES with expected chi-square distribution.

FIGURE 50. Observed distribution of chi-squares for LOGIST with expected chi-square distribution.



Table 2
ANCILLES vs. LOGIST
Goodness of Fit Comparison
Using the MSD Statistic

Item    ANCILLES MSD    LOGIST MSD
  1         --             .233
  2        .202
  3        .179
  4        .186
  5        .158
  6        .160
  7        .178
  8        .212
  9         --             .191
 10        .209            .208
 11        .225            .226
 12        .181            .179
 13        .195            .195
 14        .215            .215
 15        .156            .159
 16        .191            .192
 17        .209            .210
 18        .220            .220
 19        .185            .184
 20        .194            .194
 21        .222
 22        .201
 23        .192
 24        .228
 25        .206
 26        .191
 27        .207
 28        .209            .209
 29        .220            .220
 30        .199            .199
 31        .201            .203
 32        .202            .202
 33        .161            .154
 34        .213            .213
 35        .197            .196
 36        .181            .182
 37        .185            .182
 38        .155            .150
 39        .215            .214
 40        .211            .212
 41        .217            .218
 42        .190            .191
 43        .161            .159
 44        .113            .102
 45        .167            .160
 46        .166            .157
 47        .207            .208
 48        .200            .198
 49        .216            .217
 50        .204            .207

Mean       .194            .193

t(47) = 2.15 (p < .05)

Note. The critical value of t(47) is 2.014 for α = .05.
Parameter Estimate Distribution Analyses

Item Parameter Estimates. The item parameter estimates obtained from
ANCILLES and LOGIST are shown in Table 3. The correlations of the two sets
of estimates are displayed in Table 4. Because the origin and unit of
measurement used for the ability and item parameter estimates are arbitrary,
the scales used for the two sets of estimates are different. Therefore, to
facilitate this comparison, the ANCILLES estimates were put on the same scale
as the LOGIST estimates using procedures set out by Marco (1977). The scaled
ANCILLES a- and b-values are presented in Table 5. Scaling does not alter the
c-values. The values obtained for the a- and b-values were similar, with the
a-values having a correlation of r = .85 and the b-values having a correlation
of r = .97. The c-values were less similar, having a correlation of r = .51.
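The text does not spell out which of Marco's (1977) procedures was used. As an assumption for illustration only, the Python sketch below uses the common mean/sigma method on the b-values. It finds A and B so that the transformed ANCILLES b-values have the same mean and standard deviation as the LOGIST b-values, then sets b* = A·b + B and a* = a / A. The c-values are left unchanged, consistent with the statement that scaling does not alter them.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def mean_sigma_constants(b_from: Sequence[float], b_to: Sequence[float]) -> Tuple[float, float]:
    """Slope A and intercept B that map the b_from scale onto the b_to scale."""
    A = pstdev(b_to) / pstdev(b_from)
    B = fmean(b_to) - A * fmean(b_from)
    return A, B


def rescale(a: Sequence[float], b: Sequence[float], A: float, B: float) -> Tuple[List[float], List[float]]:
    """Apply theta* = A*theta + B to item parameters; c-values are unaffected."""
    return [ai / A for ai in a], [A * bi + B for bi in b]
```

Because $a_i(\theta_j - b_i)$ is unchanged when θ and $b_i$ are both transformed by A·x + B and $a_i$ is divided by A, the probabilities given by Equation 1 are the same on either scale.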

The distributions of the item parameter estimates obtained from LOGIST
and the scaled ANCILLES estimates are described in Table 6. Although the
obtained estimates were highly correlated, the statistics shown in Table 6
indicate that there were differences in the item parameter estimate
distributions. The a-value distributions appear quite similar. However, a
dependent t-test indicated that the mean ANCILLES a-value (.53) was
significantly lower than the mean LOGIST a-value (.61), t = 3.91 (p < .01).
A test for the significance of the difference between correlated variances
(Ferguson, 1976) yielded t = 8.68, indicating that the variance of the LOGIST
a-values was significantly greater than the variance of the ANCILLES a-values
(p < .01). Whenever variances were found to be unequal in this study, means
were tested for significant differences using the correction to the degrees
of freedom set out by Welch (1938). Tests of whether the obtained kurtosis
values (-.85 for LOGIST, -.72 for ANCILLES) were significantly different from
zero (Snedecor and Cochran, 1967) indicated that neither value was
significant; the same was true of the tests for skewness (Snedecor and
Cochran, 1967).
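The two paired-sample tests used above can be sketched in Python. The
correlated-variances statistic is written in the familiar Pitman-Morgan form,
t = (s1² - s2²)√(n - 2) / [2 s1 s2 √(1 - r²)] with n - 2 degrees of freedom;
we assume this matches the test given by Ferguson (1976), so treat the sketch
as illustrative. The a-values are hypothetical.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """Dependent t-test on the mean of the item-by-item differences x - y.

    Returns (t, df) with df = n - 1."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1

def correlated_variances_t(x, y):
    """Test of var(x) = var(y) when x and y are paired (correlated) samples.

    Returns (t, df) with df = n - 2; a positive t means x has the larger variance."""
    n = len(x)
    s1, s2 = stdev(x), stdev(y)
    r = pearson_r(x, y)
    t = (s1 ** 2 - s2 ** 2) * math.sqrt(n - 2) / (2 * s1 * s2 * math.sqrt(1 - r ** 2))
    return t, n - 2

# Hypothetical a-values for the same five items from the two procedures.
logist_a = [0.30, 0.55, 0.80, 1.10, 0.45]
ancilles_a = [0.40, 0.50, 0.65, 0.80, 0.45]

t_mean, df_mean = paired_t(logist_a, ancilles_a)
t_var, df_var = correlated_variances_t(logist_a, ancilles_a)
print(f"means: t = {t_mean:.2f} (df = {df_mean}); variances: t = {t_var:.2f} (df = {df_var})")
```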

A dependent t-test applied to the b-value means (-.06 for ANCILLES, -.34
for LOGIST) yielded t = 1.97, indicating that the mean ANCILLES b-value was
greater than the mean LOGIST b-value (p < .05). A test for the significance
of the difference between correlated variances yielded t = 6.63, indicating
that the variance of the LOGIST b-values was significantly greater than the
variance of the ANCILLES b-values (p < .01). The greater variance of the
LOGIST b-values becomes more evident when the range of values is considered.
The LOGIST b-values ranged from -7.07 to 1.27 (a range of 8.34), whereas the
scaled ANCILLES b-values ranged only from -2.88 to 2.34 (a range of 5.22).
The kurtosis value for LOGIST (12.21) was significant (p < .01), while the
kurtosis for ANCILLES (.65) was not. Moreover, the LOGIST b-values were
significantly negatively skewed (p < .01), indicating that, although the
LOGIST b-values went much lower than did those from ANCILLES, the bulk of the
LOGIST b-values were actually above the mean of -.34. The ANCILLES b-values
were not significantly skewed.
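The skewness and kurtosis checks might be sketched as follows. The sketch uses
the moment coefficients g1 and g2 with their large-sample standard errors; this
is an assumed approximation, since Snedecor and Cochran (1967) also tabulate
critical values for small samples, and statistical packages often report
bias-adjusted coefficients, so the results need not reproduce Table 6 exactly.
The b-values are hypothetical.

```python
import math

def skew_kurtosis_z(values):
    """Return (g1, g2, z_skew, z_kurt) for a list of numbers.

    g1 = m3 / m2**1.5 and g2 = m4 / m2**2 - 3 use divide-by-n moments, so g2 is
    near zero for normally distributed data. Each z is the coefficient divided by
    its approximate large-sample standard error, sqrt(6/n) or sqrt(24/n)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return g1, g2, g1 / math.sqrt(6.0 / n), g2 / math.sqrt(24.0 / n)

def significant_at_01(z):
    """Two-tailed test at about the .01 level under the normal approximation."""
    return abs(z) > 2.576

# Hypothetical b-values with one very easy item, giving a long negative tail.
b_values = [-7.0, -1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 1.2]
g1, g2, z_skew, z_kurt = skew_kurtosis_z(b_values)
print(f"skewness {g1:.2f} (z = {z_skew:.2f}), kurtosis {g2:.2f} (z = {z_kurt:.2f})")
```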
Table 3

ANCILLES and LOGIST Item Parameter Estimates

                 ANCILLES                 LOGIST
Item No.     a      b      c         a      b      c
[Rows for Items 1-44 are not legible in the source copy.]
  45        .85   1.20    .08      1.08    .94    .06
  46        .92    .97    .07      1.18    .77    .04
  47        .71    .14    .11       .53   -.11    .04
  48        .70    .89    .10       .63    .69    .04
  49        .53   -.26    .01       .41   -.31    .04
  50        .72   -.01    .20       .44   -.62    .04

Note. ANCILLES deleted Items 1 and 9 during calibration.
Table 4

ANCILLES and LOGIST Item Parameter
Estimate Correlations

                    LOGIST
ANCILLES       a      b      c
   a          .85    .79    .25
   b          .56    .97    .14
   c          .22    .07    .51

Note: Sample size for both ANCILLES and LOGIST is n = 48 items.



There were some differences in the distributions of c-values, with the
mean ANCILLES c-value significantly higher than the mean LOGIST c-value.
However, the actual obtained c-values for the two procedures did not differ
greatly in magnitude. For instance, a difference in mean c-values of .02,
although significant (p < .01), is not a large difference in practical terms.
The skewness of both distributions (1.14 for ANCILLES, 3.34 for LOGIST) was
significant (p < .01 for both), but the ANCILLES c-value kurtosis (.60) was
not significant, while the kurtosis for LOGIST (13.07) was significant
(p < .01).



When the item parameter estimates obtained from LOGIST for the two items
deleted by ANCILLES are dropped and the comparisons are made only on the 48
items in common, the descriptive statistics change somewhat. Without those two
items, the LOGIST mean b-value increases to -.17 and the b-value standard
deviation drops to .93. The minimum b-value increases to -3.07, the skewness
changes to -1.224, and the kurtosis becomes 1.891. Thus, without those two
items, the b-value distributions from LOGIST and ANCILLES are even more
similar. The a-value distributions, however, become slightly less similar when
only the common items are considered. The mean a-value for LOGIST becomes .53.
This new value slightly increases the difference in the two distributions, as
do the new kurtosis value of -.96 and the new skewness value. The new standard
deviation (.28) is slightly closer to the ANCILLES value, as is the new
minimum a-value of .25. The only changes in the LOGIST c-value distribution
are to the skewness and kurtosis values, which become 3.26 and 12.42,
respectively.
Table 5

ANCILLES Item Parameters Transformed to the LOGIST Parameter Scale

Item         a          b
  2         .62       1.52
  3         .65       1.47
  4         .75       -.06
  5         .39      -2.61
  6         .37      -2.88
  7         .59        .78
  8         .53        .11
 10         .28      -1.90
 11     (illegible)   -.11
 12         .71        .90
 13         .44      -1.09
 14         .42       -.39
 15         .88       -.63
 16         .58       -.46
 17         .45       -.50
 18         .31      -1.08
 19         .66        .62
 20         .66        .51
 21         .42        .10
 22         .38      -1.13
 23         .34      -2.01
 24         .31       -.42
 25         .76        .69
 26         .68        .21
 27         .53        .10
 28         .35      -1.24
 29         .39       -.23
 30         .57        .10
 31         .49       -.59
 32         .48        .46
 33         .75        .94
 34         .42       -.63
 35         .47        .71
 36         .65       -.50
 37         .71        .71
 38         .67       1.06
 39         .44        .50
 40         .45       -.32
 41         .41       -.20
 42         .47      -1.13
 43         .85        .37
 44         .55       2.36
 45         .55       1.45
 46         .71       1.15
 47         .55        .07
 48         .54       1.04
 49         .41       -.45
 50         .55       -.13

Note: The transformation does not alter the c-values.
Table 6

ANCILLES and LOGIST Item Parameter Estimate Descriptive Statistics

                      ANCILLES                  LOGIST
Statistic         a      b      c         a      b      c
No. of Items     48     48     48        50     50     50
Mean            .51   -.06    .06       .61   -.34    .04
Median          .53   -.06    .05       .49   -.14    .04
St. Dev.        .15   1.06    .06       .30   1.34    .03
Minimum         .29  -2.88    .01       .08  -7.07    .00
Maximum         .88   2.34    .22      1.18   1.27    .20
Skewness        .36   -.49   1.14       .47  -2.89   3.34
Kurtosis       -.72    .35    .60      -.85  12.21  13.07

Note: Statistics for ANCILLES were obtained using transformed item parameter
estimates.



[The opening of this paragraph is not legible in the source copy.] The item
parameter estimates for the items for which there was significant lack of fit
are shown for ANCILLES in Table 7 and for LOGIST in Table 8. Examination of
these tables does not give any clear [remainder of paragraph not legible in
the source copy].
Table 7

ANCILLES Item Parameter Estimates for Items for Which
There Was Significant Lack of Fit

Item                          a       b       c
  2                          .62    1.52     .17
  5                          .39   -2.61     .06
  6                          .37   -2.88     .06
 13                          .44   -1.09     .06
 16                          .58    -.46     .02
 17a                         .45    -.50     .01
 18                          .31   -1.08     .02
 23a                         .34   -2.01     .03
 27a                         .53     .10     .08
 29                          .39    -.23     .03
 33                          .75     .94     .05
 42a                         .47   -1.13     .02
 44                          .55    2.36     .03
 45                          .65    1.45     .08
 46                          .71    1.15     .07

Lack of Fit      Mean        .50    -.30     .05
                 St. Dev.    .14    1.56     .04
No Lack of Fit   Mean        .55     .05     .07
                 St. Dev.    .16     .73     .06

a Also showed lack of fit for LOGIST.
Table 8

LOGIST Item Parameter Estimates for Items for
Which There Was Significant Lack of Fit

Item                          a       b       c
 17a                         .46    -.36     .04
 22                          .33   -1.28     .04
 23a                         .29   -1.92     .04
 27a                         .54    (b and c illegible)
 34                          .41    -.49     .04
 42a                         .43   -1.03     .04

Lack of Fit      Mean        .41    -.85     .04
                 St. Dev.    .09     .70     .00
No Lack of Fit   Mean        .63    -.27     .05
                 St. Dev.    .30    1.40     .04

a Also showed lack of fit for ANCILLES.



Ability Estimates 

The final set of analyses involved the comparison of the ability estimates
obtained from LOGIST with the scaled ANCILLES ability estimates. Descriptive
statistics for the two obtained ability estimate distributions are presented
in Table 9. As can be seen from these statistics, the two distributions were
quite similar. The range of ability estimates for LOGIST was limited by
boundaries of approximately -4.00 to +4.00. In unrestricted operation, LOGIST
would allow a greater range of ability estimates than would ANCILLES (the same
tendency can be noted in the range and variance of the b-values).





Table 9

ANCILLES and LOGIST Ability Estimate
Descriptive Statistics

Statistic           ANCILLES     LOGIST
No. of Subjects       1999        1999
Mean                  -.137       -.137
Median                 .045        .142
St. Dev.              1.213       1.214
Minimum              -4.991      -4.061
Maximum               3.303       3.432
Skewness              -.706      -1.164
Kurtosis               .398       1.372

Note: Statistics for ANCILLES were obtained using transformed ability estimates.
The frequency distributions of the ANCILLES and LOGIST ability estimates
were plotted together; they are shown in Figure 51. As can be seen in the
figure, the two distributions are almost indistinguishable inside the range of
-2.00 to +2.00. The only real discrepancy between the two distributions is the
height of the LOGIST curve at about -4.00. Because of the arbitrary limits on
θ, LOGIST tends to 'pile up' at the limit those examinees whose ability
estimates would be outside the limit if the limit were not imposed. This
accounts for an unusually large number of ability estimates at approximately
-4.00. The great similarity between the two sets of ability estimates is
reflected in their correlation: the Pearson product-moment correlation
coefficient obtained for the ability estimates was r = .987. Clearly there is
a strong association between the ability estimates assigned by LOGIST and
those assigned by ANCILLES.
FIGURE 51
FREQUENCY DISTRIBUTIONS OF
OBTAINED ABILITY ESTIMATES
FOR ANCILLES AND LOGIST
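The 'pile-up' at the bound, and why it barely affects the correlation, can be
illustrated with simulated values. This Python sketch treats LOGIST's limit as
a simple clip at ±4.00, which is an assumption rather than a description of the
program; the simulated abilities and the random seed are hypothetical.

```python
import math
import random

def clip(theta, lo=-4.0, hi=4.0):
    """Force an ability estimate into [lo, hi] (a simplified stand-in for a bounded routine)."""
    return max(lo, min(hi, theta))

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1980)
# Hypothetical abilities: most examinees near the middle of the scale plus a small
# group of very low scorers, which gives the distribution a long negative tail.
unbounded = [rng.gauss(-3.5, 1.0) if rng.random() < 0.05 else rng.gauss(0.0, 1.1)
             for _ in range(1999)]
bounded = [clip(t) for t in unbounded]

# Every estimate that would have fallen below -4.00 is reported as exactly -4.00,
# producing a spike at the bound even though the two sets correlate very highly.
piled_at_lower_bound = sum(1 for t in bounded if t == -4.0)
r = pearson_r(unbounded, bounded)
print(f"{piled_at_lower_bound} estimates at -4.00; r = {r:.3f}")
```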
Discussion

When using a Pearson statistic to test the goodness of fit of data to
a model such as the 3PL model, a number of difficulties are encountered.
Before the results of this study are discussed, these problems will be
addressed, together with the manner in which they were dealt with in this
study.

One of the first problems to arise when attempting to compute a chi-square
statistic such as the one used in this study concerns the formation of
intervals on the ability estimate scale. There appears to be some question
as to how many intervals to form. For instance, Yen (in press) suggests
10 intervals, while Wright and Mead (1977) recommend six or fewer intervals.
The statistic proposed by Wright and Panchapakesan (1969) would require as
many intervals as there are obtained number-right scores. Bock (1972), in
the fit statistic he has proposed, does not set out any requirements as to
the number of categories, but in the example he sets out in his paper
(pp. 44-45) he uses 10 intervals. It is clear that the size of the interval
will affect the size of the chi-square obtained for the interval. As the
interval width increases, the difference between the observed proportions at
the ends of the interval and the expected proportion at the center of the
interval can be expected to increase. The objective, then, is to have enough
intervals (making each interval smaller) to produce sufficiently small
within-interval variances in the ability estimates, thereby reducing the
within-cell variances of the expected proportions. Alternatively, the
within-interval variance of the expected proportions can be computed and
subtracted from the denominator of the chi-square statistic (Wright and Mead,
1977).
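A minimal Python sketch of an interval-based item fit statistic follows. It
assumes the 3PL with the scaling constant D = 1.7, evaluates the expected
proportion correct at each interval's midpoint as described above, and weights
each squared discrepancy by N_j / [P_j(1 - P_j)]; the statistic actually used
in this study may differ in detail, and the function names are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant commonly used with the 3PL (assumption)

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_fit_chi_square(thetas, responses, a, b, c, edges):
    """Pearson-type chi-square for one item over ability intervals.

    thetas    -- ability estimates, one per examinee
    responses -- 0/1 item scores, aligned with thetas
    edges     -- sorted interval boundaries; interval j is [edges[j], edges[j+1])
    Returns (chi_square, number_of_nonempty_intervals)."""
    chi_square = 0.0
    used = 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        scores = [u for t, u in zip(thetas, responses) if lo <= t < hi]
        n = len(scores)
        if n == 0:
            continue  # empty intervals contribute nothing
        observed = sum(scores) / n
        expected = p_3pl((lo + hi) / 2.0, a, b, c)  # evaluated at the midpoint
        chi_square += n * (observed - expected) ** 2 / (expected * (1.0 - expected))
        used += 1
    return chi_square, used

# Usage: four examinees in one interval centred on theta = 0, where an item with
# a = 1, b = 0, c = 0 has an expected proportion correct of .5.
chi_sq, r = item_fit_chi_square([-0.5, -0.2, 0.1, 0.4], [1, 1, 1, 1],
                                1.0, 0.0, 0.0, [-1.0, 1.0])
print(chi_sq, r)
```

Widening an interval while keeping the midpoint rule leaves the expected
proportion fixed even though the examinees inside it span a broader range of
ability, which is how wide intervals inflate the statistic.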

In the current study 48 intervals were used. With such a large number
of intervals, the width of any one interval was small enough to obviate the
need to correct for the variation in expected proportions. However, using
such narrow intervals did result in very low frequencies within the extreme
intervals, with several intervals having frequencies equal to zero. To correct
for the small frequencies in the extreme intervals, some of the intervals were
collapsed together and treated as a single category.
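The collapsing step can be sketched as below. The rule (fold any end interval
with fewer than min_count examinees into its inward neighbour) and the default
threshold of 5 are assumptions for illustration; the exact rule used in the
study is not stated here. Note that the merged end intervals can become quite
wide, the situation that, as discussed below for the most positive category,
can make a midpoint-based expected proportion misleading.

```python
def collapse_extremes(edges, counts, min_count=5):
    """Merge sparse intervals at either end of the ability scale into their
    inward neighbours until each end interval holds at least min_count examinees.

    edges  -- sorted boundaries; interval j is [edges[j], edges[j+1])
    counts -- number of examinees in each interval (len(edges) - 1 values)
    Returns the new (edges, counts). Merging proceeds inward from each end only."""
    cells = [[lo, hi, n] for lo, hi, n in zip(edges[:-1], edges[1:], counts)]
    # Lower tail: fold the first interval into the next until it is large enough.
    while len(cells) > 1 and cells[0][2] < min_count:
        first = cells.pop(0)
        cells[0][0] = first[0]
        cells[0][2] += first[2]
    # Upper tail: the same, working down from the top interval.
    while len(cells) > 1 and cells[-1][2] < min_count:
        last = cells.pop()
        cells[-1][1] = last[1]
        cells[-1][2] += last[2]
    new_edges = [cells[0][0]] + [hi for _, hi, _ in cells]
    new_counts = [n for _, _, n in cells]
    return new_edges, new_counts

# Hypothetical frequencies for six narrow intervals.
edges, counts = collapse_extremes([-3, -2, -1, 0, 1, 2, 3], [0, 2, 10, 12, 3, 1])
print(edges, counts)
```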

Another problem encountered in applying a chi-square test is the de- 
termination of the appropriate degrees of freedom. The degrees of freedom 
normally associated with the chi-square goodness of fit test when parameters 
are estimated from the data is 

df = r - g - 1 (7) 

where df is the degrees of freedom, r is the number of categories, and g 
is the number of parameters estimated from the data (Daniel, 1978). That 
is, the degrees of freedom are calculated as the number of independent data 
points (observed proportions) minus the number of independent parameters 
estimated from the data to produce the expected proportion (Yen, in press). 
However, when applying the chi-square test to a latent trait model, several
changes are required. First, because the sum of the expected frequencies
is not held fixed, it does not make sense to subtract one from the number of
categories. Thus there are r independent data points, rather than r - 1 (Yen,
in press). For the 3PL model there are four independent parameters (θ, a, b,
and c) estimated from the data and used in computing the expected
proportions. The item characteristic curve for an item is fairly well defined
by the computed observed proportions, and the item parameter estimates are
clearly dependent on the observed proportions. Therefore, one degree of
freedom should be subtracted for each item parameter. However, the ability
estimates obtained were dependent upon the entire response vector, and a
given item contributes only a small proportion of the information necessary
to compute the ability estimates. Consequently, for any given item the
estimation of ability entails little loss in degrees of freedom (Yen, in
press), and it is probably more appropriate to subtract g - 1 from the
degrees of freedom, rather than g, when using a latent trait model. The
degrees of freedom used for this study, then, are given by

df = r - (g - 1) (8) 

where df, r and g are as defined above. 
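As a worked illustration (r = 10 is a hypothetical number of categories),
counting the four 3PL parameters θ, a, b, and c in g gives:

```latex
\begin{aligned}
\text{Eq. (7):}\quad df &= r - g - 1 = 10 - 4 - 1 = 5\\
\text{Eq. (8):}\quad df &= r - (g - 1) = 10 - (4 - 1) = 7
\end{aligned}
```

Equation 8 therefore allows two more degrees of freedom per item than
Equation 7, whatever the value of r.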



Chi-Square Analyses

It is clear from the results of the chi-square analyses that the LOGIST
procedure performed better in terms of goodness of fit. Neither procedure
actually fit the test as a whole, but fewer items were rejected when using
LOGIST. For the LOGIST procedure only twelve percent of the items showed
lack of fit, while for the ANCILLES procedure over thirty percent of the
items were rejected for lack of fit.

It is difficult to determine why lack of fit was significant more often for
ANCILLES than for LOGIST, especially considering that in almost half of the
cases (23 out of 48) the LOGIST chi-square was larger than the ANCILLES
chi-square. The plots of the expected and observed proportions correct are
not very revealing either. However, an examination of the chi-square values
obtained for each interval, before being summed, does give some insight into
the cause of the poor fit. For Items 17 and 27 the LOGIST chi-squares were
significant solely because of poor fit in the most positive category, as was
the case for Items 16, 17, 18, and 27 for ANCILLES. The last category on the
positive end was a very wide category, due to collapsing. Because of this,
the computed expected proportion, based on the midpoint of the interval, was
too high. For Item 27 of ANCILLES, as well as Items 6, 23, 33, 42, 44, and
45, the poor fit was concentrated in the intervals above θ = 1.00. The same
was true for Item 23 for LOGIST. For LOGIST, Items 22, 34, and 42 seemed to
fit poorly across the ability range, as was the case with ANCILLES for Items
2, 5, 13, and 46. These findings are summarized in Table 10. Poor fit at the
extreme ends of the ability range was a problem with both procedures. The
poor fit in the most positive interval was a procedural problem, and those
items should probably not be counted among the items for which there was
significant lack of fit. Without those items, there was significant lack of
fit for four items for LOGIST and 11 items for ANCILLES.
Table 10

A Summary of the Ranges of Ability for Which
Items Showed Poor Fit for ANCILLES and LOGIST

Procedure     Last Interval       Intervals Where θ > +1.0
ANCILLES      16, 17, 18, 27      6, 23, 27, 33, 42, 44, 45
LOGIST        17, 27              23



MSD Statistic

An examination of the obtained MSD statistics contributes little toward
explaining the results. The dependent t-test on these values was significant,
which is not consistent with the finding that the ANCILLES chi-squares were
not larger than the LOGIST chi-squares significantly more than half the time.
Moreover, it is disturbing that there was apparently no relationship between
the size of the MSD statistics obtained for the items and the size of the
chi-square values for the items. A comparison of the MSD values and the item
parameter estimates did not yield any clear pattern either.



Item Parameter Estimates 

A comparison of the item parameter estimates obtained from LOGIST and
the transformed ANCILLES estimates also failed to yield a clear explanation.
For the full set of items the ANCILLES and LOGIST mean b-values were not
significantly different. They were also not significantly different for the
items for which there was lack of fit, nor for the items for which there was
no lack of fit. For neither procedure was the mean b-value obtained for the
items with lack of fit different from the mean b-value for the items without
lack of fit.

The mean a-values for ANCILLES and LOGIST were significantly different.
Interestingly, however, the mean a-values were not significantly different
when considering only those items for which there was lack of fit, nor were
they significantly different when considering only the items for which there
was no lack of fit. The ANCILLES mean a-value for the items for which there
was lack of fit was not significantly different from the mean ANCILLES
a-value for the items for which there was no lack of fit. However, for LOGIST
the mean a-value for the items for which there was lack of fit was
significantly lower than the mean LOGIST a-value for the rest of the items.
Because LOGIST yielded higher a-values than ANCILLES for the full set of
items but not for those items for which there was lack of fit, it is possible
that LOGIST underestimated the a-values for those items. It did appear that
LOGIST had more trouble with items with lower discrimination values.

For the full set of items, the mean ANCILLES c-value was significantly
higher than the LOGIST mean c-value. The mean ANCILLES c-value for the items
for which there was no lack of fit was also greater than the mean LOGIST
c-value for the items for which there was no lack of fit. However, when
considering only those items for which there was lack of fit, the mean
c-values for the two procedures were not significantly different, indicating
perhaps that for these items either ANCILLES underestimated the c-values,
LOGIST overestimated them, or both. Still, for neither procedure was the mean
c-value for the items for which there was lack of fit significantly different
from the mean c-value for the rest of the items.

The comparisons of means discussed above do not yield any clear pattern.
A comparison of the estimates obtained from ANCILLES and LOGIST with the
chi-squares obtained for the procedures does indicate a consistent pattern,
however. While comparing mean values reveals surprisingly few differences in
the two sets of item parameter estimates, there is some evidence that the
lack of fit of the ANCILLES procedure is related to the item parameter
estimates. The correlation of the ANCILLES b-values with the chi-squares
obtained for ANCILLES is r = -.49. When the absolute values of the b-values
are used, that correlation is r = .68, indicating that the size of the
chi-square value obtained for ANCILLES was strongly related to the absolute
magnitude of the corresponding b-value. While the mean ANCILLES b-value for
the items for which there was lack of fit was not significantly different
from the mean for the rest of the items, the variance of the b-values for the
items for which there was lack of fit (s² = 2.43) was significantly higher
than the variance of the b-values for the rest of the items (s² = .53,
p < .001). This indicates that the b-values for the items for which there was
lack of fit were more extreme than the b-values of the rest of the items.
This difference was not indicated by the comparison of the means because the
extreme values were divided between the positive and negative ends, thus
cancelling each other out when the mean was computed. This pattern does not
occur with LOGIST, and the correlation of the LOGIST chi-squares with the
absolute values of the LOGIST b-values was r = 0.0. It appears, then, that at
least part of the difference between the fit of the two procedures is
accounted for by the poorer ability of ANCILLES to handle extreme b-values.
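The argument that extreme b-values at both ends cancel in the mean, yet still
show up in the variance and in a correlation with |b|, can be checked with a
toy example in Python. The numbers are hypothetical, chosen only so that
misfit grows with distance from b = 0 in either direction.

```python
import math
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

b_values = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical difficulties
chi_squares = [4.0, 1.0, 0.0, 1.0, 4.0]    # hypothetical misfit, larger at both extremes

r_signed = pearson_r(b_values, chi_squares)                      # extremes cancel
r_absolute = pearson_r([abs(b) for b in b_values], chi_squares)  # extremes align
print(mean(b_values), pvariance(b_values), round(r_signed, 3), round(r_absolute, 3))
```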

The correlations of the obtained chi-squares for the two procedures with
their respective a- and c-values were not significant. However, a-value
estimates also appeared to be a factor in the fit of the LOGIST procedure.
For instance, for Item 23 the fit of the model to the data for LOGIST was
poorest at the extremes of the ability range. The a-value for Item 23
obtained from LOGIST was a = .29, a relatively low discrimination. The
a-values for the remaining nonfitting LOGIST items were also low.

Most of the items for which there was poor fit can be accounted for in
one of the following ways. For three items for ANCILLES and two items for
LOGIST, the poor fit was due to a procedural problem. For the remaining items
for LOGIST, the poor fit appears to be due to the poor handling of low
discrimination values. However, since low discrimination values often
indicate multidimensionality, poor fit would be a desired result in these
cases (Reckase, 1978). For nine of the remaining 11 items for ANCILLES for
which there was lack of fit, the poor fit appeared to be primarily due to the
inability of ANCILLES to handle extreme difficulty values. For one of the two
remaining items for ANCILLES, Item 29, the poor fit seemed to be across the
ability range. Item 29, however, had a low discrimination value, indicating
that perhaps ANCILLES also does not handle low discriminators well. For
Item 33, the poor fit of ANCILLES was primarily in the intervals where
θ > +1.0.

Ability Estimates 

As was indicated by Table 9, the ability estimate distributions obtained
from ANCILLES and LOGIST were almost identical. Considering this similarity
and the fact that the two sets of ability estimates had a correlation of
r = .987, it is difficult to imagine how the ability estimates could have been
a factor in the difference in fit for the two procedures.

Summary and Conclusions 

This study was conducted to determine whether there were qualitative
differences in the parameter estimates obtained from the ANCILLES and LOGIST
estimation procedures. The comparison was made using goodness of fit as a
criterion. The results of this study indicate that there are qualitative
differences in the estimates obtained from these two procedures. While the
parameter estimate distributions obtained from these two procedures were
quite similar, lack of fit occurred for significantly more items for ANCILLES
than for LOGIST. Further analyses indicated that lack of fit for ANCILLES
appeared to be strongly related to item difficulty, while for LOGIST lack of
fit was more closely related to item discrimination. It is true that LOGIST
is more expensive to use than ANCILLES, but ANCILLES yielded lack of fit
significantly more often than LOGIST and did not yield item parameter
estimates for two items. For these reasons, LOGIST appears to be the
procedure of choice.